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Abstract 

To control for multiscale effects in networks, one can transform the matrix of (in general) 
weighted, directed internodal flows to bistochastic (doubly-stochastic) form, using the iterative 
proportional fitting (Sinkhorn-Knopp) procedure, which alternatively scales row and column sums 
to all equal 1. The dominant entries in the bistochasticized table can then be employed for network 
reduction, using strong component hierarchical clustering. We illustrate various facets of this well- 
established, widely-applied two-stage algorithm with the 3, 107 x 3, 107 (asymmetric) 1995-2000 
intercounty migration flow table for the United States. We compare the results obtained with ones 
using the disparity filter, for "extracting the "multiscale backbone of complex weighted networks", 
recently put forth by Serrano, Boguha and Vespignani (SBV) (Proc. Natl. Acad. Sci. 106 [2009], 
6483), upon which we have briefly commented (Proc. Natl. Acad. Sci. 106 [2009], E66). The 
performance of the bistochastic filter appears to be superior, in this specific case, in two respects: 
(1) it requires far fewer links to complete a strongly-connected network backbone; and (2) it "belit- 
tles" small flows and nodes less-a principal desideratum of SBV-in the sense that the correlations 
of the nonzero raw flows are considerably weaker with the corresponding bistochastized links than 
with the significance levels yielded by the disparity filter. Further, the disparity filter, in general, 
relies upon a somewhat arbitrary choice of either AND or OR rules, while the bistochastic filter 
does not. Additional comparative studies-as called for by SBV-of these two filtering procedures, 
in particular as regards their topological properties, should be of considerable interest. Relatedly, 
in its many geographic applications, the two-stage procedure has-with rare exceptions-clustered 
contiguous areas, often reconstructing traditional regions (islands, for example), even though no 
contiguity constraints, at all, are imposed beforehand. 

PACS numbers: Valid PACS 02.10.Ox, 02.10.Yn, 89.65. Cd, 89.75.Hc 
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I. INTRODUCTION 
A. Disparity filter 

The abstract for a recent paper [1] in the Proceedings of the National Academy of Sciences 
by Serrano, Boguna and Vespignani (SBV) entitled, "Extracting the multiscale backbone 
of complex weighted networks," reads-in part-as follows: "In recent years, the study of an 
increasing number of large scale networks has highlighted the statistical heterogeneity of 
their interaction pattern, with degree and weight distributions which vary over many orders 
of magnitude. These features, along with the large number of elements and links, make the 
extraction of the truly relevant connections forming the network's backbone a very challeng- 
ing problem. More specifically, coarse-graining approaches and filtering techniques are at 
struggle with the multiscale nature of large scale systems. Here we define a filtering method 
that offers a practical procedure to extract the relevant connection backbone in complex 
multiscale networks, preserving the edges that represent statistical significant deviations 
with respect to a null model for the local assignment of weights to edges. An important 
aspect of the method is that it does not belittle (emphasis added) small-scale interactions 
and operates at all scales defined by the weight distribution." 

The disparity filter employed by SBV, in the general weighted, directed case, takes the 



Here a is a preassigned significance level, k tn and k out denote the in-degree and out-degree 
"of the node to which the directed link under consideration is attached", and p*™ and p,° ni 
indicate the associated transition probabilities (normalized weights). One can employ an 
AND rule or an OR rule on the pair {a-", for testing the significance of the r/'-link, and 
thus deciding whether or not to admit it into the backbone. (SBV expressed a preference for 
the application of the OR rule "because it ensures that small nodes in terms of strength are 
not belittlefd]" [TJ SI, p. 3].) For comparative purposes, SBV also apply a global threshold 
filter, which destroys "the multiscale nature of more realistic networks where weights are 



form [H eqs. (8), (9) in SI], 
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locally correlated on edges incident to the same node and nontrivially coupled to topology" 
[H p. 6483]. 

The null hypothesis underlying the use of the disparity filter is that the incoming (out- 
going) connections of a node are produced by a uniform random assignment. In [2] the 
disparity filter has been applied by SBV to the world trade web to find "dominant trade 
channels" (cf. [3H6]). While the disparity filter is based on significance-testing, the bis- 
tochastic filter with which it is to be compared here, and discussed immediately next, relies 
upon an estimation procedure [7j. 

B. Bistochastic filter 

The underlying motivations of SBV in devising the disparity filtering methodology appear 
to be somewhat similar to those for a certain two-stage algorithm the use of which was first 
reported in 1975 P4TU]. Over the succeeding decade, this methodology was widely applied 
to internal migration flows between the geographic subdivisions of numerous nations and 
other forms of "transaction flows" (see the extensive bibiliography in [H]). Many of these 
applications were collected in the 1984 research institute survey monograph [12J. (In a 
review of [T2] , R. C. Dubes wrote that the two-stage methodology "might very well be the 
most successful application of cluster analysis" [32].) 

SBV remark that "Reduction schemes can be divided into two main categories: coarse- 
graining and filtering/ pruning" . The two-stage procedure can easily seen to fulfill a role in 
both categories. 

1. First stage 

In the first stage (iterative proportional fitting procedure [IPFP] [14J), the rows and 
columns of the table of flows (fij) are alternately ("biproportionally" [15] ) scaled to sum to 
a fixed number (say 1). Under broad conditions-to be discussed below-convergence occurs 
to a "doubly-stochastic" (bistochastic) table, with row and column sums all simultaneously 
equal to 1 [161420] . The purpose of the scaling is to remove overall (marginal) effects of 
size, and focus on relative, interaction effects. Nevertheless, the cross-product ratios (relative 
odds), ^ 3 { k \ , measures of association [7], are left invariant. Additionally, the entries of the 
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doubly-stochastic table provide maximum entropy estimates of the original flows, given the 
row and column constraints [2D [22] • Further, the ij -entry of the bistochastic table can be 
written as the product of the raw flow and a row multiplier (rj) and a column multiplier 



For large sparse flow tables, only the nonzero entries, together with their row and col- 
umn coordinates are needed for the IPFP. Row and column (biproportional) multipliers 
(Vj,Cj) can be iteratively computed by sequentially accessing the nonzero cells [23J. If the 
table is "critically sparse" , various convergence difficulties may occur. Nonzero entries that 
are "unsupported" -that is, not part of a set of N nonzero entries, no two in the same 
row and column- may converge to zero and/or the biproportional multipliers may not con- 
verge p21 P- 19] [21] [251 p. 171]. The "first strongly polynomial-time algorithm for 
matrix scaling" was reported in [26]. (Smoothing procedures could be used to modify the 
zero-nonzero structure of a flow table-treating the zeros as due to sampling, rather than 
structural effects-particularly if the table is critically sparse [2H I2E] ■ If one takes the sec- 
ond power of a doubly-stochastic matrix, one obtains another such matrix-of predicted 
two-step movements-but smoother in character. One might also consider standardizing the 
ith row [column] sum to be proportional to the number of non-zero entries in the ith row 
[column]-although we found considerable numerical difficulties when attempting this, using 
the methodology developed in [26J- for the 1995-2000 U. S. intercounty migration table ana- 
lyzed below. Another procedure-in line with the Google page-ranking ["teleporting random 
walk"] procedure [2SH20], that has been much studied and emulated-is to take some convex 
combination of the doubly-stochastic table and the N x N table all the off-diagonal entries 
of which are equal to ttj.) 

2. Second stage 

In the second stage of the two-stage procedure, the doubly-stochastic matrix is converted 
to a series of directed (0,1) graphs (digraphs), by applying thresholds to its entries. As 
the thresholds are progressively lowered, larger and larger strong components (a directed 
path existing from any member of a component to any other) of the resulting graphs are 
found. This process (a simple variant of well-known single-linkage [nearest-neighbor or 
min] clustering [31]) can be represented by the familiar dendrogram or tree diagram used 
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in hierarchical cluster analysis and cladistics/phylogeny (cf. [321 [33]). (The "CLASSIC" 
methodology proposed by Ozawa-though couched in rather different terminology-appears 
to be fully equivalent. Ozawa found the procedure to be useful in "the detection of gestalt 
clusters" [32].) 

3. Computer implementation 

A FORTRAN implementation of the two-stage process was given in [31] (and extensively 
applied in [35]), as well as a realization in the SAS (Statistical Analysis System) framework 
[36] . Subsequently, the noted computer scientist (1982 Nevanlinna medalist) R. E. Tarjan 
[37] devised an 0(M(\ogN) 2 ) algorithm [38] for strong component hierarchical clustering, 
and, then, a further improved 0(M (log N)) method [SH], where N is the number of nodes and 
M the number of edges of a directed graph. (These substantially improved upon the earlier 
works [311 EE] , which required the computations of transitive closures of graphs-in terms of 
which the analysis of Ozawa [32] is phrased-and were O(MN) in nature.) A FORTRAN 
coding-involving linked lists-of the improved Tarjan algorithm |39j was presented in |40j, 
and applied in the 1965-70 US intercounty study [41 J. If the graph-theoretic (0,l)-structure 
of a network under study is not strongly connected [12], independent two-stage analyses of 
the subsystems of the network would be appropriate. (So, it is interesting to note that there 
does exist some form of mathematical relationship between the two quite distinct stages of 
the two-stage algorithm.) 

4- Further background 

In the recent spate of activity and interest in the science of networks over the previous 
decade or so, the two-stage algorithm has been little applied nor analyzed, it seems. Neither, 
does it appear to have been re-invented. (Perhaps of relevance in this regard is that a number 
of applications of the two-stage algorithm and associated comments/discussions appeared in 
the journal IEEE Transactions on Systems, Man and Cybernetics, and that much attention 
has shifted over the years from " systems analysis" to " network analysis" . ) 

In 2008, our attention was redrawn-after a long hiatus-to this general area of research, 
by the preparation of a review [13] of "A Beautiful Mind: John Nash, game theory, and 
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the modern quest for a code of nature" [33J- We, then, posted papers in which we sought 
to bring the interesting properties and many applications of the two-stage algorithm to the 
attention of network analysts [TTJ [35] • Most significantly, we have had a letter published 
[35] . commenting on the recent SBV paper [I], elaborated upon above, to which SBV have 
responded [37] . The issues arising in this interchange form the basis for much of this study 



C. Outline of study and findings 

In the present study, we illustrate-in a new, recently conducted analysis-the use and 
properties of the two-stage algorithm, employing the large-scale example of migration be- 
tween the more than three thousand county-level units of the United States (sec. |n]) (cf. 
for analyses of U. S. intercounty flows of dollar bills, using the exceptional "Where's 



George?" database). In a discussion section (sec. Ill) we further comment on issues arising 
in the exchange of letters with SBV and its pertinence to network analysis, conduct analyses 
of a similar nature to theirs, for comparative purposes (as called for by SBV in [37]), and 



outline previous results obtained using the two-stage algorithm. In sec. [IV] we present a 
summary of our findings here. 

Our principal finding of a comparative nature is that the bistochastic filter appears, in the 
example at hand, to outperform the disparity filter in, at least, two significant features: (1) 
the number (25,329) of bistochastic links needed to generate a strongly-connected backbone 
is far fewer than the number (lying somewhere within the range 80,204 to 83,692) required 
by the disparity filter (using the OR rule, preferred "because it ensures that small nodes in 
terms of strength are not belittle [d]" [U p. S3]); and (2) the correlation of the logarithms of 
the 735,531 nonzero migration flows with the corresponding logarithms of bistochasticized 
values is considerably weaker than with the significance levels yielded by the disparity filter 
(the same form of conclusion holding without taking the logarithms, with all the pertinent 
correlations, however, being somewhat weaker in nature)-thus, "belittling" small flows less, 
a principal desideratum of SBV. 
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II. TWO-STAGE ANALYSIS OF U. S. INTERCOUNTY MIGRATION TABLE 



A. Matrix of Intercounty Flows 

Based upon a question as to responders' 1995 residences posed in the 2000 United States 
Decennial Census, one can construct a square (origin-destination) matrix of 1995-2000 mi- 
gration flows (rriij) between 3,107 county-level units of the nation. Many nations in their 
censuses ask similar questions as to previous residence. The results are often reported in the 
form of "internal migration tables" -which have served as a basis for most of the two-stage 
multinational analyses reported in [12]. A referee, however, has asserted that "According 
to UNESCO the variations existing between countries indicate that there are no objective 
definitions of migration". This may, in part, reflect difficulties, arbitrariness in aggregating 
data into administrative units for the purpose of table compilation. For a very comprehen- 
sive multi-author review entitled: " Cross-National Comparison of Internal Migration: Issues 
and Measures" , see [50] . ) 

In Fig. [TJ we show a matrix plot of this (raw data) table. (In the absence of any further 
relevant information, we set to zero the diagonal entries-which conceptually might corre- 
spond either to the number of people who actually moved within the county or who simply 
stayed within it (cf. [51]).) In the principal, admininstrative sorting of the rows/columns 
of the table, the fifty states are ordered alphabetically, while, secondarily, within the states, 
their constituent counties are ordered also alphabetically. 

We immediately discern a clear clustering along the diagonal in Fig. [TJ indicative of the 
obvious proposition that migrants have a proclivity to move intrastate-wise, both for simple 
distance and state loyalty/ties/allegiance considerations. However, the alphabetical ordering 
by states is certainly highly fortuitous in character, and we observe relatively heavy migration 
far removed from the diagonal (say for the physically contiguous, but alphabetically non- 
proximate pairs [California, Oregon] and [Lousiana, Texas].) (Historically, the design and 
layout of counties differ considerably-somewhat unfortunately from a geographic-theoretic 
point of view-between states, and we note that Texas has the most counties, 254, and 
appears as a large square far down the diagonal in Fig. [TJ while the state of Georgia, with 
the second most counties, 159, is also apparent near the upper left corner. In these and 
subsequent matrix plots, zero values are displayed as white, with negative values tending to 
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FIG. 1: Unadjusted (raw) 1995-2000 intercounty U. S. migration table. The states are first ordered 
alphabetically, and then the counties alphabetically within the states. The large square near the 
end of the diagonal corresponds to the state with the most (254) counties, Texas, while Georgia, 
with 159 counties, is located near the beginning. Counties 1000 and 2000-which Mathematica 
chooses to locate-are Boyd County, Kentucky and Dunn County, North Dakota, respectfully. 



be bluish and positive values reddish.) 



1. Multiscale character of U. S. intercounty migration flows 

In Fig. [2j we jointly plot the county number (1 to 3,107) in the administrative order 
employed in the two previous figures, along with the out-degrees and the in-degrees (that 
is, the number of counties receiving and sending migrants to and from a specific county), 
and in Fig. |3j we analogously employ the total in- and out-migrants for each county. (The 
correlation between in- and out-data for Fig. [2] is 0.965233 and 0.923365 for Fig. [3j The 
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FIG. 2: Joint plot of administrative county number and in- and out-degrees of the 3,107 counties, 
that is, the number of non-zero entries in rows and columns of the intercounty migration table. 
The ordering of counties is the same as in the first two figures. 

largest in-degrees are for Los Angeles [2,371], Maricopa [Phoenix, 2,259] and San Diego 
[2,243] Counties, respectively, while the largest out-degrees are for Maricopa [2,012], San 
Diego [1,853] and Los Angeles [1,587], respectively. In the administrative numbering system, 
Los Angeles is county #203, Maricopa, #102 and San Diego, #2243.) 

B. First stage bistochasticization of migration table 

Counties vary widely in their number of in- and out-migrants (Fig. [3|. To control for 
this (marginal/multiscale) effect, one may biproportionally/iteratively adjust the row and 
column sums (the " Sinkhorn-Knopp algorithm" [52]) so that all 6,214 = 2 x 3,107 such 
sums converge to be equal (say to 1). (This algorithm provides the basis of a measure- 
alternative to the PageRank employed by Google-of web page significance [52].) In Fig. [ij 
we show the 3, 107 x 3, 107 intercounty migration table after such a bistochasticization 
(double-standardization). Clearly, the underlying definition/delimitation of blocks has been 
heightened by this transformation. The purpose of the scaling is to remove overall effects of 
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FIG. 3: Joint plot of adminstrative county number and the 1995-2000 in-migrant and out-migrant 
totals for each of the 3,107 counties. The largest in-flows are to Los Angeles, California, Chicago 
[Cook County], Illinois and Houston [Harris County], Texas. The ordering of counties is the same 
as in the first two figures. 

size (which certainly may be of interest in themselves [Fig. [3]), and focus on the usually more 
intricate relative, interaction effects. Nevertheless, the cross-product ratios (relative odds), 
m ' jmfc ' , measures of association, are left invariant in the process. Additionally, interestingly, 
the entries of the doubly-stochastic table provide maximum entropy estimates of the original 
flows, given the constraints on the row and column sums [2Tj [22]. So, this corresponds to 
an idealized situation in which all counties were constrained to emit and receive the same 
numbers of migrants. 



1. Eigenanalyses of bistochastic table 

The dominant left and right eigenvectors (corresponding to the eigenvalue 1) of the 
doubly-standardized table are simply uniform vectors. The subdominant (left and right) 
eigenvectors (corresponding to a real eigenvalue of 0.906253) are of interest [53]. (The cor- 
relation between these two eigenvectors is high, 0.971197. The third largest eigenvalue is 
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FIG. 4: Doubly-stochastic form of the 1995-2000 intercounty U. S. migration table. Fuzziness in 
Fig.[T]is greatly reduced. The correlation between the 735,531 non-zero entries of both these forms 
of the table is 0.157125, which can be increased to 0.279051 by taking the logs of the raw flows. 



real also, 0.868784, while the fourth is slightly complex in nature, 0.84562 + 0.000906373?. 
The vector of 3,107 eigenvalues has length 12.6472.) We reorder or seriate Fig. [4] on the 
basis of the left (in-migration) eigenvector and obtain Fig. [5j and on the basis of the right 
(out-migration) eigenvector and obtain Fig. |6} Now we see diminished clustering far from 
the diagonal. Further, both of these figures suggest the division of the nation into basically 
two large clusters. 



C. Second stage-strong component hierarchical clustering (SCHC) 



Further, reordering on the basis of the ( 3 8-p age-long, 3,107-county) dendrogram ( [Til 
Supplement] [54]) generated by the strong component hierarchical clustering (the directed- 
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-0.00868334 0.00759781 0.00207526 0.0551071 1. 



0.0467724 



11 -0.0788444 -0.0202812 -0.000659818 0.0206225 0.0467724 1. 

TABLE I: Correlations between the orderings of counties used in the several numbered correspond- 
ing figures. Correlations greater than 0.0676788 in absolute value are significant at the 99.99% level, 
those greater than 0.0458262 at the 99% level, and 0.0353074 at the 95% significance level. 

graph analogue of the single-linkage method) [121 E21 EH1 EHl ESHSD] of the bistochasticized 
table, we obtain Fig. [7J (To obtain the indicated lengthy, computable searchable dendrogram 
from [H] for viewing, along with associated other [cardinal and ordinal] tree diagrams, one 
must download the source file from the http:/arxiv.org website and "tar -xvf" it.) The 
correlation between the ordering used in Fig. 7 and the admininstrative ordering used in 
Figs. 1 and 2 is 0.0373522, and between the ordering used in Fig. 7 and the orderings 
used in Figs. 5 and 6, respectively, even lower, 0.00401504 and 0.0099957 (Table g. (The 
corresponding correlations between the administrative ordering and the seriations employed 
in Figs. 5 and 6 are 0.0579257 and 0.0755089. Correlations greater in absolute value than 
0.0353074 are significant at the 95% level, 0.0400655 at the 97.5% level, and 0.0458262 at 
the 99% level.) 

The dominant feature of Fig. [7] is that the counties now listed at the beginning in the 
reordering-and, in general, the last to be absorbed in the agglomerative clustering process- 
are "cosmopolitan" or "hub-like" . They tend to receive and send migrants across the nation, 
while those nearer to the end in the reordering tend to be more provincial or limited in their 
breadth of interactions [56J . (A prototypical example of a hub-like internal migration area is 
Paris [561 [61]. I n analytically parallel studies of interjournal citations [571 E21 E2] , one might 
anticipate that the broad journals, Science, Nature and the Proceedings of the National 
Academy of Sciences might fulfill analogous roles.) This appears to be an interesting feature 
of the two-stage algorithm specific to it. 
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1. Ultrametric fit and residuals 



The ultrametric fit to this reordered bistochasticized table provided by the strong com- 
ponent hierarchical clustering [T21 E21 ESI ESI E5HHD] is given in Fig. |8j and the residuals 
(predominantly negative) from the hierarchical fit in Fig. [9j (These latter two figures, both 
in their own ways, further reflect this cosmopolitan-provincial dichotomy between the U. S. 
counties.) 



2. Use of the Direct Agglomerate command of Mathematica 



In Fig. 10 we display the bistochastic form of the 1995-2000 U. S. intercounty migration 



table now reordered on the basis of the hierarchical clustering generated by application of 
the DirectAgglomerate command of Mathematica. (We inputted our asymmetric values- 
converted to dissimilarity measures-even though the command assumes a symmetric input. 
We also applied the same command to the transpose of the dissimilarity matrix, and obtained 



somewhat differing results [Fig. [TT].) The correlation between the orderings in Fig. 10 and 
Fig. 11 is 0.0467724, and that of the ordering in Fig. [7] with those in Figs. 10 and [TT], 
0.0551071 and 0.0206225, respectively. (With the administrative ordering used in Figs.JTJand 



4j the correlations with Figs. 10 and 11 are negative, -0.00868334 and [negatively significant] 
-0.078844, respectively.) 



III. DISCUSSION AND FURTHER ANALYSES 



Much earlier (JU ED] than this current paper, we had also studied (but without the 
aid of the more recently-developed computerized matrix plots used above) bistochasticized 
forms of the 1965-70 U. S. intercounty migration table with strong component hierarchical 
clustering [12j E21 [38j 1391 155H60] . both with zero and non-zero (corresponding to intracounty 
movements) diagonal entries. Counties with large physical areas tend to absorb more of 
their own migrants, and thus exhibit larger diagonal bistochasticized entries and smaller 
off-diagonal entries in the non-zero-diagonal analysis, making them link at weaker levels in 
the dendrogram generated in the zero-diagonal analysis (cf. [51"]). 

Journals with high self-citations would be expected to behave analogously in journal 
citation-matrix analyses [571 E21 [61] . (In the application of our two-stage bistochasticization 
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FIG. 5: Doubly-stochastic matrix (Fig. [4]) reordered on the basis of its sub dominant left eigenvector. 
The first 72 counties in the ordering are all from Georgia (mostly lying in a ["Upper Coastal Plain"] 
band from the southwest corner of the state [Seminole County] to its north central boundary 
[Franklin, Hart, Elbert and Lincoln Counties]), and the last 110, all from the Great Plains states of 
North Dakota (45), South Dakota (50) and (north central) Nebraska (15). County 1000 is Bucks 
County, Pennsylvania and 2000, Lubbock County, Texas. 

and strong component hierarchical clustering procedure to the 1967-75 interjournal cita- 
tions between twenty-two mathematical journals, the Proceedings of the American Mathe- 
matical Society was found to function in a particularly broad, cosmopolitan manner in a 
zero-diagonal analysis [57], while Advances in Mathematics played an analogous role when 
diagonal entries were taken into account.) 
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FIG. 6: Doubly-stochastic matrix (Fig. [4| reordered on the basis of its subdominant right eigen- 
vector. Rather similarly to Fig. [5j the first 74 counties in the ordering are all from Georgia, and 
the last 181, all from North Dakota, South Dakota, Nebraska and Minnesota. County 1000 is 
Washington County, Louisiana and 2000, Adair County, Oklahoma. 

A. Comparisons of bistochastic and disparity filters 

In their response [U] to the letter [46 J commenting on their article PQ, Serrano, Boguna 
and Vespignani (SBV) have called for "an in-depth analysis of Slater's [two-stage] technique 
on a set of standard multiscale networks and a thorough comparison of the results with 
respect to ours and other methods, as we have done in our paper [[I]]". Of course, this is 
a most appropriate proposal, which we now pursue here. The SBV methodology appears 
capable of producing a hierarchy of nodes, so direct comparisons should be possible [65J. 
Additionally, we can choose to take as an obvious candidate-initially, at least-for the mul- 
tiscale backbone, the 25,329 links used in our intercounty migration two-stage analysis to 
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FIG. 7: Doubly-stochastic matrix (Fig. [4]) reordered on the basis of its strong component hier- 
archical clustering. The first twelve ("cosmopolitan") counties in the seriation are all from the 
"Sunbelt" states of Florida (5 counties, a well-defined cluster of four of them being equivalent 
to the Tampa-St. Petersburg-Clearwater Metropolitan Statistical Area), Arizona (2), (southern) 
California (3), Nevada (Las Vegas) (1) and Texas (Dallas) (1). The last 35 ("provincial") ones-lie 
principally in the "Black Belt", stretching through the Deep South states of Mississippi (5), Al- 
abama (24), Georgia (4) and (Panhandle) Florida (2). County 1000 is Carroll County, Indiana and 
2000, Warren County, New Jersey. 

complete the strong component hierarchical clustering (SCHC). (It would be of interest to 
overlay this backbone-both in raw and bistochastic form-on a county map of the United 
States [cf. [HI Figs. 1-4]]. If we apply the SCHC to the largest raw migration flows, rather 
than to their bistochastic counterparts, it requires more than half-a-million, as opposed to 
25,329 links, to complete the process.) This can be compared with backbones generated 
by the techniques of SBV. Of course, by choosing thresholds one can truncate links in the 
SCHC backbone with smaller bistochastic values to include precisely any specific number of 
links one a priori desires in the backbone ultimately selected. 
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FIG. 8: Ultrametric (strong component hierarchical clustering [SCHC]) fit to the doubly-stochastic 
matrix Fig. [7| The fits tend to be higher in the lower right-hand corner, corresponding to the more 
"provincial" (including "Black Belt") counties. 
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FIG. 9: Residuals (predominantly negative) of the ultrametric fit (Fig. [8]) to the doubly- stochastic 
matrix (Fig. [7]). The residuals are most negative in the lower right-hand corner, where the fits 
provided by the strong component hierarchical clustering [SCHC] were highest. 
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FIG. 10: Doubly-stochastic matrix (Fig. [4]) reordered using the hierarchical clustering generated 
by the DirectAgglomerate command of Mathematica-the only option in the Mathematica hierar- 
chical clustering package that seemed computationally feasible. The first thirteen counties in the 
reordering are from Florida (10), Hawaii (1-Kalawao, the smallest U. S. county) and Texas (2), 
while the last twenty-one are from Alabama (6), Georgia (10) and Florida (5). County 1000 is 
Rusk County, Wisconsin and 2000, Knott County, Kentucky. 

1. Correlations and cumulative plots 



Let us here note that the correlation between the 735,531 nonzero bistochastic values of 
the 3, 107 x 3, 107 intercounty migration table and the corresponding signficance levels (ctij) 
of SBV is -0.331835, using an OR rule, that is taking as the second variable Min[a*", a™*] 
and -0.420942, with the use of an AND rule, that is taking Max[a-™, a°"*]. Of course, we 
expect the correlations to be negative, since smaller a's indicate greater significance. We 
can strengthen the two correlations to -0.586303 and -0.640896, respectively, by using the 
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FIG. 11: Doubly-stochastic matrix (Fig. Q reordered using the hierarchical clustering generated 
by the DirectAgglomerate command of Mathematica applied to the transpose of the dissimilarity 
matrix. The five counties of Hawaii are clustered near the beginning. The last thirty-seven counties 
belong to either Alabama or Mississippi. County 1000 is Sciotto County, Ohio and 2000, Polk 
County, Nebraska. 

logarithms of the bistochastic values, rather than the bistochastic values themselves. 

The (notably strong) correlations between the logs of the 735,531 nonzero raw (un- 
adjusted) flows and the corresponding values of Min[a*™, a?- 4 *] is -0.886816 and with 
Max[a*™, a™'], -0.849757. Thus, large raw flows certainly tend to be highly significant in 
the disparity filter model. 

In the spirit of the analysis of SB V [1] , and in response to their call (17] for further testing, 



we present Fig. 12 On the vertical axis-as a measure of total explanation-we plot the cu- 
mulative proportion (reaching 0.550295) of bistochastic flows (red curve) and the cumulative 
proportion (reaching 0.316269) of the corresponding raw (unadjusted) migration flows (blue) 
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FIG. 12: Cumulative proportions of decreasingly ordered bistochastic flows (red curve) and corre- 
sponding (non-ordered) raw flows (blue curve). The largest 25,329 bistochastic flows are needed to 
complete the strong component hierarchical clustering and can be considered to form the multiscale 
backbone of the network. 

curve, as functions of the decreasingly ordered (from 1 to 0.0225427) largest 25,329 links 
in the bistochastic table. (The correlation between the largest 25,329 bistochastic values 
and the correponding raw [flow] values is 0.0451904, while the analogous correlation for the 
735,531 non-zero raw and bistochastic flows is larger, 0.157125. These can be increased to 
0.26249 and 0.279051, respectively, by using the logarithms of the raw flows, and further still 
to 0.318379 and 0.408434, respectively, by taking the logs of both the raw and bistochastic 
variables. Thus, large bistochastic values do exhibit a tendency to be associated with large 
raw values-but not as much as do the disparity filter significance levels.) 



Further, in Fig. 13, we show the evolution of the SCHC agglomerative clustering process. 
As edges associated with smaller bistochastic values are introduced into the initially edge-less 
digraph, more previously isolated nodes are incorporated into nontrivial strong components, 
until with the 25,329th edge, associated with a bistochastic value of 0.0225427 (for the flow 
from Indian River County, Florida to Brevard County, Florida), all 3,107 nodes (counties) 
are joined together to complete the clustering process (as well as forming a candidate for 
the multiscale backbone of the migration network). 



We have constructed a comparable figure to Fig. 13 based on the disparity filter of SBV 
[H eq. (8), SI], using an OR rule on the pair of in-flow and out-flow significance levels, 
a™, a™*. (We subtract the results from 1, in order to more directly graphically compare 
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FIG. 13: Evolution of the strong component hierarchical clustering (SCHC) process. As edges 
with decreasing bistochstic values (z-axis) are introduced into the initially edge-less digraph, pre- 
viously isolated nodes are incorporated into the multiscale backbone of the migration network, 
until eventually [with the 25,329th largest link] all nodes are joined together. 



results based on the bistochastic values.) In Fig. 14, we show the evolution of the backbone 
as the significance level is raised. 



2. Disparity filter-OR rule with significance level a = 0.01 

Employing the OR rule on the migration links with a significance level of a = 0.01, the 
number of flows (edges) passing the test was 32,294 and the number of strong components in 
the associated candidate multiscale backbone was 67, with the backbone having 59.0179% 
of the total edge weights (that is, the total number of migrants-47,240,477-recorded in the 
raw data table), a "respectable" percentage. There was one giant component with 3,040 
counties (cf. [66], [67]), 65 isolated counties and one pair, Lipscomb and Ochiltree Counties, 
Texas (previously encountered with the two-stage algorithm). Again, the isolated (singleton) 
counties (none with in- or out-degree exceeding 115) were inland ones, not particularly 
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FIG. 14: Evolution of the disparity filter backbone of Serrano, Boguha and Vespignani [lj, using OR 
rule. As edges with decreasing values of 1 — Min[c^™, a™'] (plotted along the z-axis) are introduced 
into the initially edge-less digraph, previously isolated nodes are incorporated into nontrivial strong 
components forming the multiscale backbone of the migration network. For each of the 51,900 links 
employed, Min[a-™, a°"*] < 0.05. The use of these links still leaves 15 individually isolated nodes 
(counties) (six neighboring ones from north central Nebraska), unincorporated into the backbone. 

notable as migration origins or destinations. 

3. Disparity filter-OR rule with significance levels a = 0.14 and 0.13 

With the OR rule and a much weaker significance level, a = 0.14, there are 83,693 
accepted edges, and all nodes lie in one strong component, and 73.0026% of edge weights 
is included. (We note that 83, 693 > 25, 329, the number of edges needed in the two-stage 
analysis [Tl]. For a = 0.13, there are 80,203 accepted edges and two strong components, 
with sparsely-populated [belittled?] King County, Texas, serving as a singleton.) 
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4- Disparity filter-AND rule with significance level a = 0.05 

The use of the disparity filter, using a significance level a = 0.05 on the 3, 107 x 3, 107 
raw migration table, together with an AND rule (that is, a link must pass the significance 
test, viewed as both an inflow and an outflow) yielded 25,351 links-extremely close to the 
25,329 links needed to complete the SCHC. However, with the slightly larger number of links 
obtained with the disparity filter, there were 181 distinct strong components (as opposed to 
only one with the application of the two-stage procedure). Of them, 174 were simply isolated 
nodes, and one "giant" one consisting of 2,836 counties. This left six doublets (each pair 
comprised of contiguous counties): (1) the Georgia counties of Lincoln and Wilkes; (2) the 
Georgia counties of Stewart and Webster; (3) the California counties of Inyo and Mono; (4) 
the Nebraska counties of Nuckolls and Thayer; (5) the Kansas counties of Phillips and Smith; 
and (6) the Texas counties of Ochiltree and Lipscomb. (The greatest in- or out-degree-that 
is, the number of other counties to which migrants were sent or received-for any of these 
twelve counties was 146. With the exception of the first and fourth pairs listed, the same 
doublets were obtained in the two-stage analysis |llj.) All of the 174 isolated counties were 
located away from the Atlantic and Pacific coasts, with only one from Florida and none 
from California or Arizona. (The greatest in- or out-degree for any of these 174 counties was 
132.) So, there does not seem to be any "Sunbelt" or "cosmopolitan" effect at work here. 

5. Disparity filter-AND rule with significance level a = 0.001 

Using the AND rule with a = 0.001, the resultant backbone has 10,153 links and 525 
strong components, and 42.4254% of the total edge weights. The largest component consists 
of 2,045 counties, while the next two largest are formed by 17 counties of the state of 
Montana, and 6 contiguous counties of eastern Nebraska. There are also two quintets (one 
comprised of Mississippi counties, and one of Kansas and Oklahoma counties) and four 
quartets (formed by Arkansas, Georgia, North Carolina and Texas counties). 
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B. Discussion 



1. Asymmetries 

It would seem of interest and relevance to the study here of comparative properties of 
the two filtering procedures to also address a number of points raised in the response of 
SBV [UJ to the letter |46j, commenting on their original (disparity filter) study pQ. SBV 
remark that the two-state algorithm can generate "spurious asymmetries when the original 
network is symmetric." (One can initiate the iterative proportional fitting procedure used to 
convert the flow matrix to bistochastic form by first normalizing rows or by first normalizing 
columns. However, we claim, this should not introduce significant asymmetries in the end 
result if suitable convergence is obtained.) 

2. Global/local issues 

The use of "globally" in the statement in [UJ that "individual weights in the original 
matrix are globally normalized so that they can be compared on an equal footing" ap- 
pears to suggest that the original flow matrix is simply scaled by a single number in the 
bistochastization-which is certainly not the case. (SBV assert that their methodology is 
more "local" in nature than the two-stage procedure.) In this regard, let us observe that if 
one doubles, say, the entries in a single row or column of the flow matrix, then the results 
of both the two-stage algorithm and the disparity filter are completely invariant. 

However, it is true that the decision whether or not to admit the ij-link into the network 
backbone depends only upon the entries in the z-th row and/or j-th column in the disparity 
filter, while this is certainly not the case with the bistochastic filter. 

3. Minimal spanning trees 

In [U p. 6484], SBV assert that "reduced networks obtained [using the minimal spanning 
tree (MST)] are overly structural simplifications that destroy local cycles, clustering coeffi- 
cient [s], and the clustering hierarchies often present in real world networks". The MST is the 
basis for the method of single-linkage clustering. Further, the strong component hierarchical 
clustering (SCHC) procedure [39], we have widely applied (serving as the second-stage of the 
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two-stage algorithm), can be viewed as the extension of single-linkage clustering to weighted, 
directed graphs. Therefore, the remark of SBV could be thought also to extend there. How- 
ever, we think that this general criticism is quite easily and naturally addressed, if one 
supplements the specific links in the MST, using all those links having greater weight than 
the minimal one employed in the MST, rather than simply those links present in the MST. 
(If the insertion of a link does not succeed in joining hitherto disconnected components, it 
is not included in the MST, no matter how large its value.) 

One additional, interesting observation to make is that with an undirected graph, and in 
the absence of links with precisely equal values, construction of the MST always unites just 
two connected components. In the strong component/directed graph analogue of the MST, 
the insertion of a new link can, in fact, even in the absence of links with identically equal 
values, join more than two strongly connected components." 

4- Null models 

SBV state that "there is not a clear proposal for a suitable null model to measure the 
statistical significance of the results" [of the two-stage algorithm] [37]. (For corroboration, 
they refer to an early 1976 article [HH]- However, later, in [TTJ, we did apply a graph-theoretic 
isolation criterion, we had also employed in 1981 and 1983, as well [4Tj ED], to rank clusters 
[regions] in terms of their statistical significance (cf. [681 sec - IV.3]).) 

By way of illustration, in the 1965-70 US 3,140-county migration study, a statistical test 
of Ling [69J (designed for undirected graphs), based on the difference in the ranks of two 
edges, was employed in a heuristic manner [4TJ pp. 7-8]. For example, the 3,148th largest 
doubly-stochastic value, 0.12972 (corresponding to the flow from Maui County to Hawaii 
County), united the four counties of the state of Hawaii. The (considerably weaker) 7,939th 
largest value, 0.07340 (the link from Kauai County, Hawaii, to Nome, Alaska), integrated 
the four-county state of Hawaii into a much larger 2,464-county cluster. The difference of 
these two ranks, 4,192 = 7,340 - 3,148, is a measure of isolation ("survival time") of this 
state as a cluster. Reference to Table 1 in [41] showed the significance of the state of Hawaii 
as a functional internal migration unit at the 0.01 level [4"Tl p. 7]. (In the computation of 
this table, the approximation was used that the number of edges in the relevant digraphs 
was a negligible proportion of all possible 3, 140 x 3, 139 edges.) 
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In these regards, it would also seem natural to investigate exploiting the notion of a 
"random doubly-stochastic matrix" [7U1 [7T]. (Potentially useful then would be the seminal 
result of Birkhoff that any n x n doubly-stochastic matrix can be written as a convex 
combination of at most n 2 permutation matrices-those with a single 1 in each row and 
column, and zeros elsewhere.) 

5. Properties of bistochastic matrices 

Let us further make the general observations that powers of bistochastic matrices are also 
bistochastic, and that mathematical physicists have been interested in developing conditions 
that indicate when a bistochastic matrix is also unistochastic [TU [7TH73] . (This is the 
case if the r/'-bistochastic entry is the square of the absolute value of the r/'-entry of a 
unitary matrix.) It would be interesting to investigate whether or not unistochasticity is of 
value in the modeling of network flows.) An efficient algorithm-considered as a nonlinear 
dynamical system-to generate random bistochastic matrices has recently been presented [TT] 
(cf. [70j EI])- (Gudder has quite recently developed the concept of a bistochastic transition 
effect matrix [73].) 

6. Cosmopolitan/ provincial dichotomy 

Although, by no means, have we yet systematically compared the clustering structures 
produced by the bistochastic and disparity filter approaches in our 1995-2000 migration anal- 
ysis, the two methods do seem to yield rather different (but still largely contiguous) results 
(regions). One distinguishing, highly attractive feature of the bistochastic approach has been 
its ability to contrast "cosmopolitan" (hub-like or centralized) units from "provincial/local" 
ones. We are not aware of any comparable feature with the disparity filter. 

Geographic subdivisions (or groups of subdivisions) that enter into the bulk of the den- 
drograms produced by the two-stage procedure at the weakest levels are those with the 
broadest ties. These are "cosmopolitan" , hub-like prototypical example being the 

French capital, Paris [121 Sec. 4.1] [56]. Similarly, in parallel analyses of other internal 
migration tables, the cosmopolitan/non-provincial natures of London [75], Barcelona [76] 
[El Sec. 6.2, Figs. 36, 37], Milan [77] [121 Sec. 6.3, Figs. 39, 40] (cf. [10]), Amsterdam 
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[121 p. 78] [78], West Berlin [Hp. 80], Moscow (the city and the oblast as a unit) [8] 
[L2"| Sec. 5.1 and Figs. 6, 7], Manila (coupled with suburban Rizal) [7j5], Bucharest [80], 
Ile-de- Montreal [TJl p. 87], Zurich, Santiago, Tunis and Istanbul [9] were-among others- 
highlighted in the respective dendrograms for their nations [T2j Sec. 8.2] [601 PP- 181-182] 
[571 p. 55]. In the 1965-70 intercounty analysis for the US, the most cosmopolitan entities 
were: (1) the central-located paired Illinois counties of Cook (Chicago) and neighboring, 
suburban DuPage; (2) the nation's capital, Washington, D. C; and (3) the paired South 
Florida (retirement) counties of Dade (Miami) and Broward (Ft. Lauderdale) jJU EH1 EI] • 
In general, counties with large military installations, large college populations or state cap- 
itals also interacted broadly with other areas [HI p. 153]. Application of the two-stage 
methodology to 1965-66 London inter-borough migration [78] indicated that the three inner 
boroughs of Kensington and Chelsea, Westminster, and Hammersmith acted-as a unit-in a 
cosmopolitan manner [121 Sec. 5.2, Fig. 10]. (In Sec. 8.2 and Table 16 of the anthology of 
results [12], additional geographic units and groups of units found to be cosmopolitan with 
regard to migration, are enumerated.) 

7. Clusters/regions obtained in two-stage internal migration analyses 

Geographically isolated (insular) areas-such as the Japanese islands of Kyushu and 
Shikoku [55] -emerged as well-defined clusters (regions) of their constituent (seven and four, 
respectively) subdivisions ("prefectures" in the Japanese case) in the dendrograms for the 
corresponding two-stage analyses, and similarly the Italian islands of Sicily and Sardinia 
[77], the North and South Islands of New Zealand, and the Canadian islands of Newfound- 
land and Prince Edward Island [TJl p. 90] (cf. [H21E3]). The eight counties of Connecticut, 
and other New England groupings, as further examples, to be elaborated upon below, were 
also very prominent in the highly disaggregated U. S. analysis [41] . Relatedly, in a study 
based solely upon the 1968 movement of college students among the fifty states, the six New 
England states were strongly clustered JHH Fig. 1]. Employing a 1963 Spanish interprovin- 
cial migration table, well-defined regions were formed by the two provinces of the Canary 
Islands, and the four provinces of Galicia [76] [TJl Sec. 6.2.1, Fig. 37] (cf. [S3]). The south- 
ernmost Indian states of Kerala and Madras (now Tamil Nadu) were strongly paired on 
the basis of 1961 interstate flows [EH]. A detailed comparison between functional migration 
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regions found by the two-stage procedure and those actually employed for administrative, 
political purposes in the corresponding nations is given in Sec. 8.1 and Table 15 of [12J. (In 
the 1995-2000 U. S. analysis at hand, particularly distinct large multicounty migration re- 
gions, well describable as "French Louisiana", "Northern Lower Michigan", "Northern New 
England", "South Jersey"... were found [HI Tables I, II].) 

In a 1989 monograph, Gawryszewski [351 EE] attempts to regionalize-presenting numerous 
dendrograms-the voivodships (provinces) of Poland on the basis of (total, rural-to-urban, 
and urban-to-urban) internal migration in the 1952-83 period, using the two-stage algorithm. 

It should be noted that it is rare that the two-stage methodology yields a migration region 
composed of two or more noncontiguous subregions-even though no contiguity information, 
of course, is explicitly present in the flow table nor provided to the algorithm (cf. [281 187]). A 
notable exception-comprehensible in terms of regional disparities in wealth, however-to this 
(topological!) rule was the uniting of the northern Italian region of Piemonte-the location 
of industrial Turin, where Fiat is based-with (poor) southern regions, before joining with 
central regions, in an aggregate 18-region 1955-70 study [10J [12, p. 75] (cf. [77] and [UJ p. 
26]). 



IV. SUMMARY 



As a final commentary, let us contrast the bistochastic and disparity filters in the fol- 
lowing manner: in the bistochastic approach, the network flow matrix is converted into 
a single bistochastic (doubly-stochastic) matrix, while in the disparity approach of SBV, 
in the general asymmetric case, the network flow matrix is converted into two stochastic 
matrices (row sums being normalized to 1, or column sums). Both approaches, then, use 
these associated matrices for the construction of network backbones. In the bistochastic 
approach, the entries of the associated matrix are themselves employed as linkage values, 
while in the disparity approach, the matrix entries are mapped to significance levels, which 
are then used (applying either AND or OR rules-a perhaps somewhat arbitrary feature) 
for backbone construction. Both approaches are invariant under transposition of the origi- 
nal flow matrix, and/or multiplication of rows or columns by scaling factors. (This seems 
somewhat surprising, since one might expect the statistical significance levels [a's] in the 
disparity filter to change with sample size [such as the total immigrants to an area] (cf. [71 p. 
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2]). Analyses involving either or both procedures should make explicit whether the presence 
of zero flows is considered to be due to structural or sampling considerations.) 

The observation reported above that far fewer links (25,329 vs. more than 80,203 (cf. 
[Fig. |l4])) are needed to construct a strongly connected network backbone in the bistochas- 
tic case than in the disparity analysis, would seem to argue in favor of the bistochastic 
filtering methodology. Also supportive of such a finding, is the fact that the correlations 
of the logs of the 735,531 nonzero raw (unadjusted) intercounty migration flows with the 
logs of the corresponding bistochasticized links is 0.318379, but with the disparity filter 
significance levels, considerably stronger in nature, that is, -0.849757 (using the AND rule) 
and -0.886816 (OR rule). Thus, in terms of the criteria employed by Serrano, Boguna and 
Vespignani themselves [1], it appears that the bistochastic filter, desirably, "belittles" small 
flows (and nodes) less so than does the SBV disparity filter. Nevertheless, much more de- 
tailed comparative studies, in particular as to "topological" properties, as called for by SBV, 
are certainly in order. (One important topological property is that of strong connectedness, 
upon which the second stage of the two-stage algorithm [T2] is based. Another is the pro- 
nounced rarity-even in the absence of imposition of contiguity constraints-of noncontiguous 



couplings observed [sec. Ill B 7 in the numerous two-stage internal migration analyses of 
various nations that have been conducted. This phenomenon is well exhibited in the 3,000+ 
U. S. county dendrogram in the supplementary material.) 
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